Abstract
This chapter describes progress in building computer systems that understand people and can work with them in the manner of an attentive, human-like assistant. To accomplish this, I have built a series of real-time experimental testbeds, called Smart Rooms. These testbeds are instrumented with cameras and microphones and perform audio-visual interpretation of their human users. Real-time capabilities include 3D tracking of the head, hands, and feet, and recognition of hand and body gestures. The system also supports face recognition and interpretation of facial expression.
Introduction
My goal is to make it possible for computers to function like attentive, human-like assistants. I believe that the most important step toward achieving this goal is to give computers an ability that I call perceptual intelligence: they must be able to characterize their current situation by answering the questions who, what, when, where, and why, just as writers are taught to do.
In the language of cognitive science, perceptual intelligence is the ability to solve the frame problem: to classify the current situation so that the system knows which variables are important and can act appropriately. Once a computer has the perceptual intelligence to know who, what, when, where, and why, simple statistical learning methods have been shown to be sufficient for it to determine which aspects of the situation are significant and to choose a helpful course of action [205].
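To make the claim concrete, the following is a minimal sketch (not the system described in this chapter) of how a perceptually classified situation can drive action selection with nothing more than frequency counting. The situation tuples, feature names, and action labels are all hypothetical illustrations.

```python
from collections import Counter

# Hypothetical logged observations: (situation, action that proved helpful).
# A situation is a (who, where, what) tuple; all names are illustrative.
LOG = [
    (("alice", "desk", "typing"), "mute_notifications"),
    (("alice", "desk", "typing"), "mute_notifications"),
    (("alice", "door", "leaving"), "save_work"),
    (("bob", "desk", "reading"), "dim_lights"),
    (("bob", "desk", "typing"), "mute_notifications"),
]

def choose_action(situation, log):
    """Pick the action most often helpful in similar past situations.

    Each past situation votes for its logged action, weighted by how many
    features it shares with the current one -- a simple statistical
    (frequency-based) decision rule, with no hand-coded action logic.
    """
    votes = Counter()
    for past, action in log:
        overlap = sum(a == b for a, b in zip(situation, past))
        votes[action] += overlap
    return votes.most_common(1)[0][0]

# A new person typing at the desk matches the "typing at desk" pattern:
print(choose_action(("carol", "desk", "typing"), LOG))  # -> mute_notifications
```

The point of the sketch is that once perception has reduced the scene to who/where/what labels, the learning component can be almost trivial; all of the difficulty lies in producing those labels reliably.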